Computer Engineering (计算机工程)

Cross-View Geo-Localization Based on Dynamic Metric Learning in UAV Scenarios

  • Published: 2025-04-07

Abstract: Prior work on cross-view geo-localization has mainly asked whether a query image matches one specific location in a predefined gallery. This formulation overlooks the rich multi-scale structural information inherent in geographic space. For robust localization, a model must capture fine-grained building details and also understand the spatial relationships among targets, as expressed by building clusters and environmental features, so that accuracy improves across spatial scales. To address this, we introduce a new task, multi-spatial-scale cross-view geo-localization, and present ML-Campus, a dataset built specifically for it. ML-Campus contains multi-view, multi-source building images, each annotated with labels at multiple spatial scales that reflect the relationships and continuity across those scales. Using this dataset, we empirically evaluate existing cross-view geo-localization methods to establish a benchmark for the new task. To further improve performance, we train models with the proposed CV-HAPPIER method, which aims to strengthen feature representations across spatial scales. Extensive experiments on ML-Campus show that CV-HAPPIER significantly improves the spatial robustness of cross-view geo-localization retrieval rankings.